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Abstract — A Bayesian data-analysis framework is described 
for atmospheric and surface retrievals from remotely-sensed 
hyper-spectral data. Some computational techniques are high- 
lighted for improved accuracy in the forward physics model. 

I. Introduction 

In the Earth-Science (ES) community, work has been done 
for decades on the retrieval of atmospheric and surface 
properties from flux and radiance measurements gathered 
by dedicated sensors, ranging from multi-spectral passive 
instruments to active devices such as microwave radar and 
lidar. These instruments have traditionally been designed to 
operate in certain spectral “windows” where components of 
interest have well-known and isolated signatures; meanwhile, 
theorists were able to develop ingenuous approximations to the 
associated forward radiative-transfer models. This combination 
of “smart” observations and “simplified” physics allowed 
the use of relatively simple and computationally affordable 
retrieval algorithms in remote sensing. Although some loss of 
accuracy was inevitably incurred, the computational resources 
of the past decades simply could not handle the problem 
in a more direct way. This, of course, is no longer true; 
yet, these “legacy” techniques are still deeply ingrained in 
the atmospheric radiative-transfer community, and continue to 
artificially limit our retrieval capabilities. 

The situation is exacerbated by the rapid and largely unco- 
ordinated growth in the number and variety of remote- sensing 
instruments operated by a multitude of institutions around 
the globe. The somewhat independent characterization and 
deployment of these instruments, and the subsequent, rather 
disconnected analyses of their data sets have led to serious 
cross-validation issues that remain largely unresolved. There- 
fore, the establishment of a unified framework for instrument 
characterization and data analysis is crucial if we are to utilize 
the data volume of our ever-growing “sensor web” in the most 
efficient and accurate manner. Given the potential impact on 
society, as well as the policy-makers, of our measurements 
and predictions, our retrievals must truly represent “the best 
estimate of the scientific community” 

The trend in Earth observation has been shifting recently 
towards passive hyper- spectral imaging instruments. With hun- 
dreds of spatial pixels and hundreds of spectral channels, 
these sensors observe the terrestrial atmosphere and surface 
in space, time, and wavelength, thus producing complete 


“snapshots” of the dynamical Earth system in the form of “data 
cubes ” As such, they provide vital information for assessing 
global changes in cloud and aerosol properties, abundance of 
atmospheric water vapor, ozone, and carbon dioxide, sea and 
land surface temperatures and albedos, etc . Such sensors can 
simultaneously satisfy the needs of many communities that 
have heretofore felt compelled to build their own expensive 
custom instruments for every new mission. 

The potential for simultaneous measurement and retrieval of 
a large set of atmospheric and surface properties by a single 
(type of) instrument also brings with it the need for more 
comprehensive and accurate forward models. For if an entire 
data cube is to be utilized for its information content, then the 
entire array of physical processes that might have influenced 
the data must be efficiently accounted for. Currently, too many 
investigators are continuing to use rudimentary techniques 
such as band ratios in analyzing hyper- spectral data. We 
believe that our computational power is now ready for more 
accurate implementations of the forward physics as well as 
more rigorous retrieval algorithms. Under the Intelligent Data 
Understanding Project of the NASA CICT Program, we have 
been developing a Bayesian data-analysis framework that is 
expected to greatly enhance our understanding of, as well 
as our ability to model and predict, the Earth system. In 
this paper, we describe the salient features of this framework 
along with some details of the forward physics model where 
improvements have been made to the state of the practice. 

II. Retrieval Theory 

We begin with the basic statement of our problem. De- 
note the unknown parameters underlying an experiment as 
x = { x n , n = 1,2, . and the measurements made 

during the experiment as y = {y m , m — 1,2,..., M }. The 
“standard” (static) measurement model relating x and y is of 
the form 

y = f(x) + £, (1) 

where the vector-valued function f : M. N ^ R M represents 
the typically nonlinear “forward” physics model for the experi- 
ment, and e = {e m , m = 1, 2, . . . , M} denotes the ubiquitous 
measurement error, comprising the systematic bias as well as 
the stochastic noise of the measuring instrument. The “inverse” 
retrieval problem, then, is that of estimating the state vector x 
from a given measurement vector y [1], [2]. 



A. Bayesian inference 

In the Bayesian philosophy [3], [4], one’s knowledge of a 
quantity z is represented by an associated probability density 
function (PDF) p(x)\ the sharper p{x) is around some value 
x *, the more confident we are that x ~ x* in actuality (see 
Figure 1). For the experimental scenario described above, 
Bayes’ theorem asserts that [3]-[6] 

= ^(yl x ) p(x) 

P( y) f£(ylx)p(x)dx' 

Here, p(x) and p(x|y) are, respectively, the a priori and a 
posteriori PDFs - the prior and the posterior - of the state 
vector x, and ^(y|x) is the likelihood of the measurement 
vector y. The form of the likelihood function represents the 
solution of the underlying modeling problem. The prior, mean- 
while, encapsulates our initial “best guess” and associated un- 
certainty about the unknown parameters. Upon measurement, 
this prior is transformed into the posterior via Bayes’ theorem; 
a sharper posterior (relative to the prior) indicates an improved 
confidence on our part as to the value of x after having seen 
the data y. 

A decision strategy is needed next to extract from p(x|y) an 
optimal estimate for x. We adopt the maximum a posteriori 
estimate x* given by the mode of the posterior ( i.e that value 
of x for which the posterior is maximized). Defining a suitable 
“cost” function 1 


we therefore seek the global minimum of J: 

J(x*)<J(x) Vx^x*. (4) 

Finally, the “width” of the posterior, however it is defined, 
provides a quantitative measure of uncertainty - an “error 
cloud” ( Le ., a multi-dimensional error bar) at the tip of the 
estimated state vector x*. This width tends to shrink with 
increasing number of measurements, a phenomenon usually 
referred to as “Bayesian learning.” Expanding the log-posterior 
into a Taylor series around x*, we obtain 

lnp(x|y) = lnp(x*|y) - \ (x - x‘) T S* -1 (x - x*) + . . . . 

(5) 

Note that the first-order term is absent by virtue of (4). The 
eigenvalues and eigenvectors of the matrix S* so defined serve 
naturally as the desired measure of uncertainty around x*. 

B. Gaussian case 

It is generally acceptable to model the statistics of the mea- 
surement error with a Gaussian PDF; thus, e ~ Af(5 , £), with 
J\f signifying the normal distribution. Here, the mean vector 
S represents the average offset of the instrument while the 
covariance matrix E characterizes its statistical fluctuations; 
see §m-A. Consequently, one has the likelihood function 

e(y\x) oc e~5 [y-r(*)-«] T E _1 [y-f(*)-i] (6) 


J(x) = -ln£(y|x) -lnp(x), (3) 

1 Any monotonically non-increasing function of the posterior would do here; 
our particular choice has the advantage of turning the multiplicative form of 
(2) into the more convenient additive form of (3), as well as of dispensing with 
the term p(y), which does not depend on x and therefore has no bearing on 
the optimal estimate. Further simplifications result from the use of the natural 
logarithm if the exponential family of PDFs are adopted for the likelihood 
and the prior; see §II-B. 
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Fig. 1 . The Bayesian representation of knowledge. The uniform PDF (a) 
indicates complete uncertainty about x, while the sharp PDF (c) reflects a high 
degree of certainty that the value of x is close to the position of the peak, 
x *. The broad PDF {b) is an intermediate case: the position of the peak may 
still be taken as a reasonable guess for x; however, there is now considerable 
uncertainty about this estimate, signified by the much larger spread of the 
distribution around its peak. 


It is also common, albeit more questionable, to adopt a 
Gaussian PDF for the prior: 

p(x) oc e ~2 ( X ~G T >=‘ 1 (*■-€)> (7) 

where the mean vector £ and the covariance matrix S re- 
spectively represent one’s best guess for and the associated 
uncertainty about x before the experiment is conducted. 

Substituting (6) and (7) into (3), we obtain the cost function 

r( x ) = \ [y - f( x ) - <*] T S_1 [y - f ( x ) - <*] (8) 

+ \ (x-0 T s- 1 (x-£) + C, 

where C is an unimportant constant. Several methods are avail- 
able for minimizing such a cost function. The simplest of these 
(i e.g simulated annealing, steepest descent) are also the most 
inefficient, whereas the use of sophisticated methods (e.g., 
conjugate gradients, quasi-Newton) may be a computational 
overkill if the optimization problem at hand is only mildly 
difficult. We will describe in §II-C an elegant and effective 
approach that is the method of choice for cost functions of 
the form (8) with moderately nonlinear forward models. 

We note in passing that the use of Gaussian PDFs is 
typically justified on “holy grounds” such as the central-limit 
theorem [5] or the maximum- entropy principle [6]; the real 
reason for their use in many investigations is simply the 
ensuing mathematical tractability of the analysis, and too often 
this is done without careful attention being paid to the limits 
of validity of this assumption. It is clear from (8), however, 
that in the case of a nonlinear forward model, the assumption 


of Gaussianity does not yield a simple (i.e., quadratic) cost 
function. There is therefore no reason to shy away from using 
non- Gaussian likelihood functions and priors that may more 
faithfully represent the experimental reality. 

C. Optimization approach 

The assumption of moderate nonlinearity suggests that one 
can linearize the forward model around an “operating point” 
xq as 

f(x) ~ f(xo) + K(xo) (x - Xo), (9) 

where the M x N Jacobian matrix is defined as 

K(x) = Vf(x) = |^^J. (10) 

Substituting (9) into (8) and taking the gradient, we find 

Vj(x) = S- 1 (x-^)-K(x 0 ) T S- 1 (11) 

■ [y - f(x„) - K(xo) (x - Xo) - S] , 

where J(x) is the locally linearized cost function. Setting this 
expression to 0 and solving for x yields the local minimum, 
which may then be used as a new operating point x*. Iterating 
this procedure, we obtain the “update” equation 

Xfc+i =x*-(KjS- 1 K fc +S- 1 )" 1 (12) 

• [S-^Xfc-O-KifE-’fy-* ’*-*)], 

where f k = f (x fc ) and K k = K(x*) for k = 0, 1, . . .. At the 
k * iteration, the covariance matrix is given by 

S k = (Kjll- 1 K k + 3- 1 )- 1 - ( 13 ) 

Note that the covariance updates are not iterative, but are 
simply generated along the way in accordance with (5). For 
convenience, the iteration may be started with the initial guess 
xo = at convergence, we have x ~ A/^x*, S*), assuming 
that S* does not extend beyond the neighborhood of local 
linearity around x*. 

The above optimization scheme constitutes the Gauss- 

Newton method [2]; it makes partial use of the Hessian 
information, and may therefore be regarded as of “order l| ” 
Despite the regularizing effect of the prior, the iteration step 
size - the second term on the right-hand side of (12) - 
may still be large enough to push x* +1 outside of the local 
neighborhood of x fc within which (9) holds. This violates 
the basic premise of the method, and the iteration becomes 
unstable. We avert this situation by adaptively penalizing large 
steps via the Levenberg-Marquardt strategy [7]. 

Finally, a convergence criterion is needed so that (12) is 
not iterated ad infinitum. The logical choice is to monitor 
the step size at each iteration; of course, the step sizes in 
different directions must be scaled appropriately to provide a 
sensible measure of change between consecutive updates. In 
addition, it is wise to make sure that the associated change in 


the cost function is also appropriately small. Thus, we declare 
convergence when 

(x fc+ 1 - X*) T Sfc 1 (x* +1 - x fc ) < N (14) 

and 

| J(x fc+1 ) - J(x*)| < M. (15) 

While the actual evaluation of the Hessian is generally 
avoided by all methods as it is deemed too costly computa- 
tionally, the use of accurate gradient information is extremely 
crucial for successful optimization. Traditionally, the Jacobians 
(10) have been evaluated by differentiating the forward model 
approximately as 

TyT fm( x n Ax n ) — fm( x n) 

N-mn — • ( 10 ) 

This not only requires (at least) MN additional evaluations of 
the forward model, which is very prohibitive for the typical M 
and N values encountered in ES data analysis, but also leads to 
rather inaccurate Jacobians due to the inevitable arbitrariness 
in the choice of the “external” perturbation sizes Ax n . It is 
being recognized by more and more investigators that the most 
efficient way to generate Jacobians is by an “internal” pertur- 
bation of the forward model. In this approach, straightforward 
application of the chain rule of differentiation permits one, 
for a relatively minor computational overhead, to calculate 
Jacobians within the same scheme used for the forward-model 
calculation. In fact, this consideration is so critical for the 
retrieval method that the numerical approach for forward- 
model implementation should be chosen on the basis of the 
availability of a thorough internal perturbation analysis. Our 
method of choice for forward model and Jacobian evaluations 
is described in §IH-C. 

D. Retrieval framework for Earth-Science data analysis 

It is of interest to generalize (2) to a practical situation where 
multiple measurements y a (each of length M a ) are available 
for processing, and multiple unknowns x 6 (each of length N^) 
are wished to be retrieved. In ES retrieval problems, the index b 
may run over atmospheric temperature, pressure, gases, clouds, 
and aerosols, surface temperature, albedo, and bi-directional 
reflectance distribution, etc., with the corresponding state vec- 
tors representing suitable discrete parameterizations of these 
physical variables. Meanwhile, the index a may run over 
measurements obtained by different (remote and/or in situ ) 
instruments, or over multiple measurements made by a single 
instrument (or both). In any case, the essential requirement 
is that the various measurements be nearly simultaneous in 
space and time, such that the underlying physical state of 
the atmosphere and the surface may be assumed to remain 
unchanged during the span of these measurements. 2 

2 In the case of hyper- spectral image processing, there are significant 
benefits to be accrued from a spatially-varying representation for the state 
vectors; however, limited space does not permit further elaboration here. 
Allowing for full spatio-temporal evolution of the state vectors over the course 
of the experiment takes us into the vast realm of global data assimilation and 
circulation models [8], into which we also will not venture. 


For compactness, we collect the various measurement and 
state vectors into sets Y = {y a } (data) and X = {x b } 
(unknowns). Based on the above description, it is realistic to 
expect that the different measurement vectors will be mutually 
statistically independent, and likewise for the different state 
vectors. Thus, the appropriate generalization of (2) is 


P(X|Y) = 


l(Y|X)p(X) 
P( V) 


^k) IIp^) 

' a b 


(17) 

and with Gaussian likelihoods and priors, the corresponding 
cost function becomes 


J(X) = - ln*(Y|X) - lnp(X) (18) 

= \ £ [y- - W - «-] T [y° - f aW - s a \ 

a 

+ 9 5Z ( Xfc “ 1 ( X6 “ €l>) + C' ' 

1 b 

We can now proceed with the Gauss-Newton approach toward 
minimizing this full cost function to obtain optimal estimates 
and error bars for the unknown quantities. This, then, consti- 
tutes the Bayesian calculus of inference - a framework for 
probabilistic reasoning within which data from any number 
heterogeneous sources are systematically combined with any 
type of prior information to solve the retrieval problem. Given 
the ongoing explosion in ES data volume, this appears to be the 
only approach that can efficiently assimilate and meaningfully 
analyze the diverse information gathered by the distributed 
agents of a sensor web. Several points may be made in 
connection with this formulation: 

• In the Bayesian setting, the underlying physical model 
of the observed phenomenon, and not the data collected 
from it, is regarded as the item of central importance; 
working in this “model space” enables seamless inte- 
gration of data from any number of in situ and remote 
(ground as wed as orbital) sensors. 

• Measurements and priors with lesser uncertainty, as quan- 
tified by the matrices £ 0 and S&, will play a proportion- 
ately greater role in determining the position of the global 
minimum of J (i.e. y the optimal estimate). 

• Naturally, different measurements y 0 will have varying 
sensitivities to different states x*,; this will be reflected 
(to first order) by the M a x ^2 b N b Jacobian matrices 
K«(X) = Vf a (X). 

• One does not need a separate forward model for each 
data stream, but only for each type of measurement; for 
instance, if all the measurements are made by radiation 
sensors, then a single radiative-transfer calculation may 
suffice as the forward model for the entire data set. 

• Individual measurements contribute to the cost function 
additively in (18), which allows for the processing of 
“streaming” data (as opposed to batch processing of the 
entire data set after it has been acquired); with each new 
measurement, the optimal estimate and its covariance can 
be updated through (12) and (13). 


• Since parametric uncertainties can be assessed along 
with the best estimates, data acquisition may be steered 
automatically, and ceased when sufficient confidence in 
the inferred model parameters is attained, thus preventing 
the data volume from growing unnecessarily. 

• The Bayesian approach thus utilizes the solution of the 
conceptually easier forward modeling problem to solve 
the more difficult, and arguably more interesting, inverse 
inference problem; domain knowledge and expertise enter 
naturally into the solution through the priors and the 
likelihood functions. 

III. Forward Models 

Consider an atmosphere composed of a mixture of gases 
indexed by i and bounded from below by a (partially) re- 
flecting surface. We shall take this medium to be static and 
plane-parallel (i.e. y a horizontally homogeneous slab) within 
the spatio-temporal span of our measurements. The physical 
state of the medium is then fully characterized by the vertical 
profiles of temperature T(z) y pressure P(z), and molecu- 
lar volume number densities Ni{z) y along with a spectral 
reflectance function p(A;s;s') for the surface. We wish to 
retrieve these physical properties from remote measurements 
of the spectral intensity (or radiance) J*(r,s). 

It must be acknowledged, however, that during the act of 
measuring a physical signal, a sensor inevitably imparts its 
own “signature” by altering the signal properties in some 
unique fashion; in this sense, the measurement device consti- 
tutes an integral part of any physical data-collection process. In 
order to achieve the most accurate retrieval of the atmospheric 
and surface properties of interest, it is therefore necessary to 
incorporate a detailed physical model of the sensor in series 
with one for the atmosphere-surface system. We thus require 
two forward models in our data-analysis effort. 

A. The sensor 

We specialize to an hyper- spectral instrument that samples 
the radiation spectrum uniformly over some wavelength range 
(Amin, Amax) with a resolution (or channel spacing) A A and a 
total number of channels (or bands) M = (Ama X — A m j n )/AA. 3 
With the sensor at position r and pointing in direction — n, the 
output of the channel may be written in the form 

ym = 0m J V»(A m -A) J x(ns)I A (r,s;X)dndA 

"F ^ m "F 7?m- (19) 

The instrument “slit function” ip(\) has been assumed iden- 
tical for all channels, and the “antenna pattern” x(^) has been 
taken independent of wavelength, though these can obviously 
be kept more general; typically, these functions are known 
from calibration experiments. On the other hand, A m , /? m , 
<5 m , and r? m respectively denote the channel center wavelength 

3 The term “hyper- spectral” implies that M is of the order of a few hundred 
or more; however, the ensuing discussion applies equally well to multi-spectral 
instruments with a few to a few tens of (not necessarily contiguous) channels, 
as well as to (essentially single-channel) active instruments. 




Fig. 2. The composite forward problem. The solar beam with spectral flux 
density F? is incident on the terrestrial atmosphere composed of gases, 
clouds, and aerosols; in this paper, we consider an atmosphere composed only 
of molecules. Through absoiption, scattering, and reflection, the atmospheric 
and surface constituents impart their spectral signatures onto the up-welling 
spectral intensity if. This is, in mm, detected by an hyper-spectral instrument 
on a satellite platform, which produces the data stream {y m }. The red and 
blue double arrows respectively designate the atmosphere-surface and the 
instrument forward models. (The spectra are stylized for illustration.) 


(determined by the optics), gain (set by the electronics), offset, 
and noise. Specifically, S m are the mean of the optical and 
electrical noise sources in the sensor, such as the photodiode 
dark noise and the (optical) background and inter-pixel cross- 
talk noise. Meanwhile, r? m model the fluctuations associated 
with these noise sources, as well as the thermal noise of the 
electronics; they are taken to be zero-mean Gaussian random 
variables with covariance Emm' = Together, 

and 7? m comprise the measurement error e m of §11. We assume 
that all these sensor parameters have also been adequately 
characterized via suitable calibration experiments prior to 
instrument deployment; if this is not the case, they can be 
arranged into an instrument state vector Xj ns and included in 
the set X of unknowns to be retrieved from data. 

Finally, the spectral intensity Tx(r,s;X) incident on the 
sensor is a nontrivial function of the atmospheric and surface 
states comprising X; this part of the forward model will be 
developed in §IH-B . The first term in (19) thus constitutes 
the “composite” (i.e., atmosphere-surface plus instrument) 
forward model / m (X) of §11 (see Figure 2). 

The somewhat phenomenological equation (19) is the type 
of channel model commonly employed in the literature; how- 
ever, a rigorous signal/noise analysis of a generic electro-optic 
sensor reveals that this is merely an approximation to the full 
probabilistic model of such an instrument [9]. As is echoed at 
many recent technical meetings, currently one of the biggest 
sources of uncertainty in remote sensing is the poor sensor 
calibration standards and practices. It is therefore imperative 
that one verifies the validity of the form of (19) before using 
it as the basis of any calibration and/or retrieval algorithms. 


B. The atmosphere-surface system 

In a plane-parallel atmosphere, the equation of solar radia- 
tive transfer for the spectral intensity is f 1 0] [ 1 2] 

dl 

cos 6 = -k( v\ z) I v (z, 0 , <j>) + <j{v\ z) (20) 

oz 

• f f p(6, <f>] O' , (j>) Iv(z, O' , (f)') sin 6' d&' d<f>' , 
Jo Jo 

where 0 and <j> are the usual polar and azimuth angles (see 
Figure 3), k(i/; z) = cr(v] z) + a(v, z), with «, cr, and a 
respectively denoting the extinction, scattering, and absorption 
coefficients of the medium, and we switched to the frequency 
variable v — c/X more commonly used in spectroscopy. For a 
molecular atmosphere, the scattering phase function is given 
by p(0) = (1 + cos 2 ©), where 0 denotes the (solid) 

angle of scattering between directions (0\(p') and (0,<^>). The 
scattering and extinction coefficients, meanwhile, are given in 
terms of the molecular concentrations as 



( 21 ) 


For a molecule irradiated by an unpolarized plane wave, the 
scattering and extinction cross sections are given by [ 1 3]-[ 1 5] 

C*x(v) = 2^2 |7f<(i/)| 2 , (22) 

CextO) = — 97^0), (23) 

Co 

where k = ‘hxvlc, c and 6 0 are the speed of light and 
the dielectric permittivity in vacuo , and 7Tj(i/) denotes the 
molecular polarizability of the z th gas. 

The problem description is made complete by the specifica- 
tion of boundary conditions that the intensity must obey at the 


z 



Fig. 3. The geometry of the radiative transfer problem, (a) An arbitrary 
ray at position r propagating in direction s. ( b ) The incident and reflected 
ray angles for the surface bi-directional reflectance distribution function 
p(v\ 6, <p; 6', <t>’). (c) The incident solar ray. For propagation into the lower 
hemisphere, the polar angles are measured from the — z axis. 



top of the atmosphere (z — H) and at the surface (z ~ 0): C. Computational details 


K (H, Mi 4>) — F? S(p — fi Q ) 6(<f> — 0®), (24) 

7+(0, <t>) = f J p(y ; p, <fi; fx , <j>') (25) 


where + /~ indicate up-/down-welling intensities, ( 0 0 ,<£°) 
specify the direction of the incident solar beam, p = cos 0 , 
p' = cos0', and p° = cos # 0 (see Figure 3). In our work, 
we have so far used the Kurucz model for the solar spectral 
flux F 0 ; satellite measurements of F 0 have recently become 
available and should be used to remove from data the influence 
of solar variability. Meanwhile, (25) clearly demonstrates the 
need for an accurate surface model even if one is interested in 
retrieving only the atmospheric properties. When the surface is 
not known or only poorly characterized, a suitable parametric 
model should be adopted for p and a corresponding state vector 
Xs Ur added to the set X of unknowns. 

In the spectroscopy literature, the extinction coefficient is 
typically written in the form 


k{v\ z) = £*,(*) J2 Si i(z) fWiVijWnijiz ),; *,•(*)], 

* 3 

(26) 

where j is the resonance (or line) index, and / denotes the 
gas extinction line shape. For a line centered at frequency u 0 
with a collision-broadened width 7 and a motion-broadened 
width 7 , the Voigt profile is given by the convolution integral 
[ 11 ], [ 12 ] 


fiy\v 0,7.7) 


2 1 f°° 

7T v/7T7 J _ 00 ( i - - 1/0 - w) 2 + 7 2 


(27) 

consisting in the folding of Lorentzian and Gaussian line 
shapes, respectively dictated by the impact theory and the 
Doppler effect. 

The line center, line strength, and collision- and motion- 
broadened line widths in (26) are respectively given by [16] 


v \j — + ^ij p (28) 



where the z dependence has been suppressed for convenience, 
and * indicates quantities whose values are obtained from the 
HITRAN database. The remaining symbols are /i P : Planck’s 
constant, k&: Boltzmann’s constant, 7V A : Avogadro’s number, 
and molecular mass of the i th gas. 

The equations in this subsection indicate the complicated 
manner in which the atmospheric and surface properties in- 
fluence the spectral intensity measured by the sensor; thus, 
they collectively comprise the forward physics model for the 
atmosphere-surface system. 


The standard technique for solving (20) is the discrete- 
ordinate method of Chandrasekhar [10], [12]; the widely 
used DISORT code is a very clean and efficient numerical 
implementation of this approach [17]. Here, the atmosphere is 
divided into L homogeneous layers; the I th layer has its lower 
and upper boundaries at altitudes z\~ 1 and z*, and its optical 
depth is defined as 

ti{v) = f k(w, C) d£. (32) 

Jzi-i 

The choice for the layer boundaries z\ - as well as L itself - is 
dictated by the available priors and measurements, the required 
retrievals, and computational considerations associated with 
the forward model. We choose to define the atmospheric state 
variables at the layer boundaries, comprising the ( L + 1 )- 
dimensional state vectors 


x* = {&* = &(**)}, b = T,P,Ni. (33) 


We use the 1976 US Standard Atmosphere [12] profiles 
for the prior means and an hyper-parametric Markovian 
structure for S b . 

Instead of (27), we prefer to work with an alternate (Fourier- 
transformed) version of the Voigt profile given by 

J{v\ Vo, 7, 7 ) = r dt. (34) 

J —OO 

This enables the use of efficient FFT routines, as well as 
simplifying the evaluation of forward-model Jacobians. Use 
of (34) and (26) in (32) leads to 


n{v)= f I $(*;C)dC 

7—oo 


A2-jn/t 


where 


$((;*) = £ TV, (*)£>;(*) 


dt, (35) 


(36) 


exp {-i27r Vij(z)t - 2nyij(z)]t\ - {■Kj ij (z)t}' 2 } . 


For the vertical integration in (35), we do away with the 
widely-used hydrostatic law and the van de Huist-Curtis- 
Godson scaling approximation [11], [12]. Instead, we expand 
$(£; z) over 1 < z < zi as 




[T(z) - 7U] (37) 


3$ 

+ 

3$ 

+ dNi 


[P(z) - P,- 1 ] 


The partial derivatives appearing above may be evaluated ana- 
lytically from (36) with (28)-(3I). The vertical profiles are then 
fit by cubic splines [7] - an O(L) operation - for which the 
vertical integrals are also evaluated off-line. A highly accurate, 
and aesthetically pleasing, representation of the atmospheric 


states is thus achieved for a minor computational overhead. 
An FFT operation completes the evaluation of tj. 

DTSORT is then fed with the optical depths of the layers, 
the solar spectral flux, and the surface spectral reflectance, 
and returns the spectral intensity I v {z y 6, <f>) at user-specified 
altitudes and angles. This output is finally used in (19) to 
complete the forward-model evaluation. 

Recently, Spurr has carried out an internal perturbation 
analysis of the discrete ordinate method [18], which has 
led to the LIDORT code that is now available. Although 
still somewhat rough around the edges, LIDORT outperforms 
DISORT by nearly a factor of 10 in intensity calculations, 
and can also be used to generate forward-model Jacobians. 
Here, one uses straightforward chain-rule differentiation on 
the expressions in §III-B to feed LIDORT with the derivatives 
of the layer optical depths ti with respect to the state vectors 
Xfc. (It is possible to perturb the model with respect to the 
surface parameters as well.) We are presently exploring this 
tool, along with automatic code differentiators, in lieu of crude 
external perturbations a la (16). 

IV. Summary and Outlook 

All the ingredients are thus in place for bringing the 
Bayesian inference framework described in §II-D to bear on 
the atmospheric and surface remote sensing problem. Note that 
the popular approach of building pre-computed look-up tables 
has been abandoned in favor of retrieving all the key physical 
parameters directly from the data. This also liberates us from 
the limitations and inaccuracies of standard computational 
techniques such as the correlated-A: method [11], [12]. 

At present, the only inexact aspect of our physical model 
is the scalar treatment of the radiation field. It has been well- 
established in the radiative-transfer literature that polarization 
effects may accumulate to make a sizable contribution to 
the scalar intensity I even in a pure Rayleigh-scattering 
atmosphere; with the addition of clouds, aerosols, and detailed 
surface effects into the forward physics, it becomes imperative 
that we employ a full vectorial model of the radiation field 
in terms of the Stokes vector I [10]. A vectorial version of 
DISORT has been developed recently [19], and efforts are 
under way to construct a corresponding vectorial version of 
LIDORT; these improvements will be incorporated into our 
framework in the near future. 

Our immediate focus is on the application of our compu- 
tational framework to the processing of hyper-spectral images 
from the Hyperion instrument flying on Earth Observing 
One [20]. For this task, we will be using a spatial random- 
field model for the atmospheric states, enabling the use of a 
Kalman filter for processing the pixels as separate (correlated) 
measurements. We are particularly interested in Hyperion 
images gathered over the Southern Great Plains site of the US 
Department of Energy’s Atmospheric Radiation Measurement 
Program [21]. The presence of a large number and variety of 


instruments at this site will supply us with accurate “ground 
truth” for validating our retrievals, as well as providing an 
ideal setting for demonstrating the power of our framework for 
processing heterogeneous data. We thus hope to achieve, for 
the first time, the simultaneous and highly-accurate retrieval 
of atmospheric and surface properties from a single data set. 
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